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Abstract. Recent SVD-free matrix factorization formulations have enabled rank minimization for systems with millions of 
rows and columns, paving the way for matrix completion in extremely large-scale applications, such as seismic data interpolation. 

In this paper, we consider matrix completion formulations designed to hit a target data-fitting error level provided by the 
user, and propose an algorithm that is able to exploit factorized formulations to solve the corresponding optimization problem. 
Since practitioners typically have strong prior knowledge about target error level, this innovation makes it easy to apply the 
algorithm in practice, leaving only the factor rank to be determined. We explore the role that rank of the factors plays in our 
formulation, and show that rank can be easily adjusted as the inversion proceeds. 

Within the established framework, we then propose two extensions that are highly relevant to solving practical challenges 
of data interpolation. First, we propose a weighted extension that allows known subspace information to improve the results 
of matrix completion formulations. We show how this weighting can be used in the context of frequency continuation, an 
essential aspect to seismic data interpolation. Second, we extend matrix completion formulations to be extremely robust to 
large measurement errors in the available data. 

We illustrate the advantages of the basic approach on the Netflix Prize problem using the Movielens (1M) dataset. Then, 
we use the new method, along with its robust and subspace re-weighted extensions, to obtain high-quality reconstructions for 
large scale seismic interpolation problems with real data, even in the presence of extreme data contamination. 

1. Introduction. Sparsity- and rank-regularization have had tremendous impact in many areas over 
the last several decades. Sparsity in certain transform domains has been exploited to solve underdetermined 
linear systems with applications to compressed sensing [51 IE]? natural image denoising/inpainting [34^ [22 ], [24] . 
and seismic image processing [l7l[27l[T6l|25]. Analogously, low rank structure has been used to efficiently solve 
matrix completion problems, including the Netflix problem, along with many other applications, including 
control, system identification, signal processing, and combinatorial optimization [10, 29 , 7 , and seismic data 
interpolation and denoising [28] . 

Regularization formulations for both types of problems introduce a regularization functional of the 
decision variable, either by adding an explicit penalty to the data-fitting term 

mmp(A(x)-b) + \\\x\\, (QP A ) 

X 

or by imposing constraints 

mmp(A{x) - b) s.t. \\x\\ < r . (LASSO r ) 

X 

In these formulations, x may be either a matrix or a vector, || • || may be a sparsity or rank promoting penalty 
such as || • ||i or || • ||*, A may be any linear operator that predicts the observed data vector b of size p x 1, 
and p(-) is typically taken to be the 2-norm. 

These approaches require the user to provide regularization parameters whose values are typically not 
known ahead of time, and otherwise require fitting or cross-validation procedures. 

The alternate formulation 

min | \x\ | s.t. p(A(x) - b) <rj. (BPDNJ 

X 

has been successfully used for the sparse regularization of large scale systems [4 , and proposed for nuclear 
norm regularization [5]. This formulation requires the user to provide an acceptable error in the data fitting 



domain (BPDN^), and is preferable for many applications, especially when practitioners know (or are able 
to estimate) an approximate data error level. 
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A practical implementation of (BPDN^ ) for large scale matrix completion problems is difficult because of 



the tremendous size of the systems of interest, which makes SVD-based approaches intractable. Fortunately, 
a growing literature on factorization-based rank optimization approaches has enabled matrix completion 



formulations for (QPa ) and (LASSO r ) approaches for extremely large-scale systems that avoiding costly SVD 



computations [31 ] 121 ] 130] . In this paper, we extend the framework of [5] to incorporate matrix factorization 
ideas, enabling the ( jBPDN^ ) formulation for rank regularization of extremely large scale problems, such as 
seismic data interpolation. 

In factorized formulations, the user must specify a factor rank. From a computational perspective, it 
is better that the rank stay small; however if it is too small, it may be impossible to solve (BPDN^) to a 



specified error level r\. Fortunately, factor rank can be adjusted on the fly within the framework we develop. 

to be the quadratic penalty, recent extensions pQ 



While formulations in [4[ [5] choose p in (BPDN 



allow more general penalties to be used. In particular, robust convex (see e.g. [18]) and nonconvex penalties 
(see e.g. [19] [2]) can be used to measure misfit error in the (BPDN^) formulation. These extensions are 



also captured by the framework we propose for the rank optimization problem, and allow matrix completion 
problems that are robust to data contamination. 

Finally, subspace information can be used to inform the matrix completion problem, analogously to 
how partial support information can be used to improve the sparse recovery problem [12 . This idea is 
especially important for seismic interpolation, where frequency continuation is used. We show that subspace 
information can be incorporated into the proposed framework using reweighting. 

To summarize, we design factorization-based formulations and algorithms for matrix completion that 



1. Achieve a specified target misfit level provided by the user (i.e. solve (BPDN^)). 



2. Achieve recovery in spite of severe data contamination using robust cost functions p in (BPDN^ 

3. Incorporate subspace information into the inversion using re- weighting. 



The paper proceeds as follows. In section |2j we briefly discuss and compare the formulations (QP 



(LASSO r ), and (BPDN^). We also review the SPG-^i algorithm [4] to solve (BPDN^), along with recent 



extensions for ( jBPDN^ ) formulations developed in [1 . In section [3j we formulate the convex relaxation 



for the rank optimization problem, and review SVD-free factorization methods. In section [4] we extend 
analysis from [6 to characterize the relationship between local minima of rank-optimization problems and 
their factorized counterparts in a very general setting. In section [5] we propose an algorithm that combines 
matrix factorization with the approach proposed by [4] and extended in pQ . We develop the robust extensions 
in section [6] and reweighting extensions in section [7] Numerical results for both the Netfrix Prize problem 
and for seismic trace interpolation of real data are presented in section [8] 



2. Regularization formulations. Each of the three formulations ([QPa), (LASSO r ), and (BPDN^ 



controls the tradeoff between data fitting and a regularization functional using a regularization parameter. 
However, there are important differences between them. 

From an optimization perspective, most algorithms solve (QPa) or (LASSO r ), together with a contin- 
uation strategy to modify r or A, see e.g., [11] [4]. However, from a modeling perspective (BPDN^) has 
a significant advantage, since the r\ parameter can be directly interpreted as a noise ceiling. In many ap- 
plications, such as seismic data interpolation, scientists have good prior knowledge of the noise floor, or a 
threshold beyond which noise is commensurate with the data. In the absence of such knowledge, one still 
wants an algorithm that returns a reasonable solution given a fixed computational budget. 



van den Berg and Friedlander [4 proposed the SPG^i algorithm for optimizing (BPDN^) that captures 
the features discussed above. Their approach allows solving (BPDN^) using a series of inexact solutions 



to (LASSO r ). The bridge between these problems is provided by the value function 



v(r) = mm p(A(x) — b) s.t. \\x\\ < r , 



(2.1) 



where the particular choice of p(-) = || • || 2 was made in [4, 5 . The (BPDN^) problem can be solved by find 
the root of v(r) = 77 using Newton's method: 



r = r 



tj'(t) 



(2.2) 



and the quantities v(r) and v'{r) can be approximated by solving (LASSO r ) problems. The graph of v(r) is 
often called the Pareto curve. In the context of sparsity optimization, (BPDN^) and (LASSO r ) are known 



to be equivalent for certain values of parameters r and rj. Recently, these results were extended to a very 
broad class of formulations (see p~[ Theorem 2.1])[^] 



For any p, v(r) is non-increasing, since larger r allow a bigger feasible set. For any convex p in (2.1 ), v(r) 
is convex by inf-projection [32, Proposition 2.22]. When p is also different iable, it follows from [TJ Theorem 
5.2] that v(t) is different iable, with derivative given in closed form by 



v\r) = -\\A*Vp(b-Ax) 



(2.3) 



where A* is the adjoint to the operator A, 
when the norm I 



is the dual norm to || • ||, and x solves LASSO r For example, 



in (2.1) is the 1-norm, the dual norm is the infinity norm, and (2.3) evaluates to the 



maximum absolute entry of the gradient. In the matrix case 
and then 



is typically taken be the nuclear norm, 



norm 



\d is the spectral norm, so (2.3) evaluates to the maximum singular value of A*V p(r). 
To design effective optimization methods, one has to be able to evaluate v(r), and to compute the dual 
d- Evaluating v(r) requires solving a sequence of optimization problems (2.1), for the sequence 



of r given by (2.2). These problems can be solved inexactly, with increasing accuracy as the optimization 
proceeds [4]. For large scale systems, the method of choice is typically a first-order method, such as spectral 
projected gradient. Fast projection is therefore a necessary requirement for tractable implementation, since 
it is used in every iteration of every subproblem. 



With the inexact strategy, the convergence rate of the Newton iteration (2.2) may depend on the con- 

4, Theorem 3.1]. For well-conditioned problems, in practice one can 

for a given rj. As the 



ditioning of the linear operator A 

often observe only a fe w (6-10) (|LASSO r ) problems to find the solution for (BPDN 



optimization proceeds, ( |LASSO r 
previous r. 



problems for larger r warm-start from the solution corresponding to the 



3. Factorization approach to rank optimization. We now consider (BPDN^ ) in the specific context 
of rank minimization. In this setting, || • || taken to be the nuclear norm, where for a matrix X G ]R nxm 5 
11^ II* = ||cr||i, where a is the vector of singular values. The dual norm in this case is (Too, which is relatively 
easy to find for very large systems. 



Much more difficult is solving the optimization problem in (2.1). For the large system case, this would 
require projecting onto the set \\X\\^ < r, which requires repeated application of SVDs. This is not feasible 
for large systems. 

Factorization-based approaches allow matrix completion for extremely large-scale systems by avoiding 
costly SVD computations [31, 20, 30 . The main idea is to parametrize the matrix X as a product, 



X = LR 1 



(3.1) 



and to optimize over the factors L, R. If X G R nxm , then L G R nx , and R G ]R mx/ \ The decision variable 
therefore has dimension k(n + m), rather than nm; giving tremendous savings when k <C m, n. 

Aside from the considerable savings in the size of the decision variable, the factorization technique 
gives an additional advantage: it allows the use of a factorization identities to make the projection problem 
^SSO r ) trivial, entirely avoiding the SVD. 
For the nuclear norm, we have 131] 



\x\ 



inf - 

x=lr t 2 



Working with a particular representation X = LR T , therefore, guarantees that 

2 



1*11, = WLR 1 



< 



(3.2) 



(3.3) 



The nuclear norm is not the only formulation that can be factorized. [21] have recently introduced the 
max norm, which is closely related to the nuclear norm and has been successfully used for matrix completion. 



Surprisingly, convexity of misfit and regularizer is not required for the result. 
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4. General Burer-Monteiro Result. All of the algorithms we propose for matrix completion are 
based on the factorization approach described above. Even though the change of variables X = LR T makes 
the problem nonconvex, it turns out that for a surprisingly general class of problems, the change of variables 
does not introduce any extraneous local minima, and in particular any local minimum of the factorized 
problem corresponds to a local or global minimum of a related un-factorized problem. This result appeared 
in [6j Proposition 2.3] in the context of SDP; however, it holds in full generality, as the authors point out [6J 
p. 431]. 

Here, we state the result in full generality for a broad class of problems, which is general enough to 
capture all of our formulations of interest. For completeness, we provide a proof in the appendix. 
Theorem 4.1 (General Factorization Theorem). Consider an optimization problem of the form 



min f(Z) 

s.t. g%(Z) < i = 1, . . . , n 
h 3 (Z) = j = l,...,m 
rank(Z) < r, 



(4.1) 



where Z G ]R nxn is positive semidefinite, and f^g%^ hi are continuous. Using the change of variable Z = SS T , 
take S G IR nxr ; and consider the problem 

mm f(SS T ) 

s.t. gi (SS T )<0 z = l,...,n (4-2) 
hj(SS T ) =0 j = l,...,m 



Consider Z — SS T , where Z is feasible for ( |4.1[ ). Then Z is a local minimum of (4.1) if and only if S is a 



local minimum of (4.2). 



In order to apply Theorem 4.1 to the formulations of interest, we need to show that these formula- 
tions can be expressed in terms of a positive semidefinite matrix Z. The connection comes from the SDP 
characterization of the nuclear norm. 

It was shown in [29l Sec. 2] that the nuclear norm admits a semi-definite programming (SDP) for- 
mulation. Given a matrix X G ]R nxm 5 we can write ||X||* using an auxiliary matrix Z G ]^( n + m ) x ( n + m ) 



\\X\l=mm\Tr{Z) 
subject to Zi 5 2 



Z 2,l 



(4.3) 



where Zi^ is the upper right n x m block of Z, and Z2,i is the lower left m x n block. More precisely, the 
matrix Z is a symmetric positive semidefinite matrix having the structure 

LL T 



X 
RR T 



(4.4) 



where L and R have the same rank as X, and Tr(Z) 

Using this characterization, we can show that a broad class of formulations of interest in this paper are 



\L\\l 



\R\\ 2 F - 



in fact problems in the class characterized by Theorem 4.1 



Corollary 4.2 (General Matrix Lasso). Any optimization problem of the form 

min f(X) 

Ji 

S.t. ||X|L<T 



(4.5) 



has an equivalent problem in the class of problems (4.1) characterized by Theorem J±.l 



Proof Using (4.3), write (4.5) as 



mm f(K(Z)) 



s.t. 



Tv(Z) < t , 
4 



(4.6) 



where TZ(Z) extracts the upper right n x m block of Z. 
□ 



5. Factorized Pareto Algorithm. The factorized formulations in the previous section have been used 
to design several algorithms for large scale matrix completion and rank minimization [21] [30] . However, all 
of these formulations take the form (QPa ) or (LASSO r ) that have the disadvantage of having to identify the 



parameters A and r ahead of the optimization procedure. 



Instead, we propose to use the factorized formulations to solve the (BPDN^) problem by traversing 



the Pareto curve of the nuclear norm minimization problem. In particular, we integrate the factorization 
procedure into the SPG^i framework, which allows to find the minimum rank solution by solving a sequence 
of factorized (LASSO r ) subproblems (5.2). The cost of solving the factorized (LASSO r ) subproblems is 



relatively cheap and the resulting algorithm takes advantage of the inexact subproblem strategy in [4]. 
For the classic nuclear norm minimization problem, we define 



v(t) = mm\\A(X) 



b\\ 2 2 s.t. ||X|U <r 



(5.1) 



and find v(r) = r] using the iteration (2.2). 

However, rather than parameterizing our problem with A, which requires SVD for the projection, we 
use the factorization formulation, exploiting Theorem 4.1 and Corollary |4.2| When evaluating the value 
function v(r), we actually solve 



min \\A(LR J 



b\\ 



s.t. 



< T 



(5.2) 



By Theorem |4.1| and Corollary |4.2[ any local solution to this problem corresponds to a local solution 
of the true LASSO problem, subject to a rank constraint on X. For any convex p, every local minimum 
of (LASSO r ) is also a global minimum, though it is not necessarily unique. When the rank of the factors L 



and R is smaller than the rank of the optimal LASSO solution, the algorithm looks for local minima of the 
rank-constrained LASSO problem. 



The problem (5.2) is optimized using the spectral projected gradient algorithm. The gradient is easy to 



compute, and the projection requires rescaling all entries of L, R by a single value, which is fast, simple, and 
parallelizable. 

To evaluate v'(t), we use the formula (2.3), treating X — LR T as a matrix in its own right. This requires 
computing the largest singular value of 

A*(b-A{LR T )) , 



which can be done relatively quickly. 

5.1. Increasing k on the fly. An important aspect to the factorization formulation is the choice of 
rank for the factors L, R in (3.1 ). The choice of rank k directly controls the rank of X = LR T , and picking k 



too small makes it impossible to fit the data up to a target level rj. Moreover, a large enough rank guarantees 
that the factorized formulation will find a global minimum to the LASSO problem. It is therefore desirable 
to be able to dynamically increase k, especially for problems with multiscale structure (e.g. seismic data, 
where k is expected to grow with frequency). 

Fortunately, adding columns to L and R can be done on the fly, since 



[L 1} [R 



LR T + lr T 



Moreover, the proposed framework for solving (BPDN^) is fully compatible with this strategy, since the 



underlying root finding is blind to the factorization representation. Changing k only affects iteration (2.2) 
through v(t) and v'(r). 
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5.2. Computational efficiency. One way of assessing the cost of the factorized Pareto algorithm is to 



compare the computational cost per iteration of the factorization constrained LASSO subproblems (5.2) with 



that of the nuclear norm constrained LASSO subproblems (5.1). We first consider the cost for computing 
the gradient direction. While both methods must compute the action of A* on a vector, the factorized 
formulation must modify factors L, R (at a cost of 0(k(n + m)) and re-form the matrix X = LR T (at a cost 
of at most O(fcnm), for every iteration and line search evaluation. Note that when A is a sampling matrix, 
it is sufficient to form only the entries of X that are sampled by A, thus reducing the cost to O(kp), where p 
is the dimension of the measurement vector b. The sparser the sampling operator A, the greater the savings; 
however, the factorized approach incurs a cost proportional to the rank k of the factors. The standard 
approach must modify the iterate X, at a cost of O(nm), for every iteration and line search evaluation. If 
the fraction sampled is smaller than the chosen rank /c, the factorized approach is actually cheaper than the 
standard method. 

We now consider the difference in cost involved in the projection. The main benefit for the factorized 



formulation is that projection is done using the Frobenius norm formulation (5.2), and so the cost is 0(k(n + 
m)) for every projection. In contrast, any classic implementation that computes the full SVD in order 
to accomplish the projection (see e.g. [3 j) is dominated by the cost of this calculation, which is 0(nm 2 ), 
assuming without loss of generality that n > m. This cost, incurred at every projection, becomes prohibitively 
expensive for large scale computation. Finally, both standard and factorized versions of the algorithm require 
computing the maximum singular value in order to compute v'(r). 

It can be seen that for a large number of projected gradient iterations, the cost of the nuclear norm 
formulation becomes significantly larger than our factorized formulation. Moreover, based on the analysis in 
section [4] if the rank of the factors L and R is at least equal to /c, then the factorized LASSO subproblems 
are guaranteed to find a global minimum for the nuclear norm LASSO subproblems. Consequently, both 
formulations will have the same number of Pareto curve updates, since the derivates are necessarily equal at 
any global minimum whenever p is strictly conve^] Therefore, when the chosen rank of the factors L and R 
is appropriately large, our factorized formulation will not require additional Pareto curve updates compared 



to the classic nuclear norm (BPDN^) formulation. 

Note that it may also be possible to take advantage of randomization techniques as proposed in [15], 
and/or Lanczos methods, to speed up standard SVD-based approaches. However, we do not know of an 
existing algorithm for rank minimization that implements this strategy. For this reason, in section [81 we 



compare only with TFOCS [J for the least squares (BPDN^) formulation (see Table 8.3). 



6. Robust Formulations. Robust statistics [18, 26] play a crucial role in many real-world applications, 
allowing good solutions to be obtained in spite of data contamination. In the linear and nonlinear regression 
setting, the least-squares problem 

mm\\F(X)-bf 2 

corresponds to the maximum likelihood estimate of X for the statistical model 

6 = F(X) + e, (6.1) 

where e is a vector of i.i.d. Gaussian variables. Robust statistical approaches relax the Gaussian assumption, 
allowing other (heavier tailed) distributions to be used. Maximum likelihood estimates of X under these 
assumptions are more robust to data contamination. Heavy-tailed distributions, in particular the Student's t, 
yield formulations that are extremely robust to outliers [19j E] . This can be explained by their re-descending 
influence functions [26]. The relationship between densities, penalties, and influence functions is shown in 
figure [6] Assuming that e has the Student's t density leads to the maximum likelihood estimation problem 

min p(F(x) ~b):=J2 lo g(^ + (F(X)i -b % ) 2 ) (6.2) 



2 It is shown in pQ that for any differentiable convex p, the dual problem for the residual r = b — Ax has a unique solution. 
Therefore, any global minimum for ( |LASS07] > guarantees a unique residual when p is strictly convex, and the claim follows, 
since the derivative only depends on the residual. 
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Fig. 6.1. Gaussian, Laplace, and Student's t Densities, Corresponding Negative Log Likelihoods, and Influence Functions. 



where v is the Student's t degree of freedom. 

was proposed in [1 j, allowing different penalty funtionals p. The root- 



A general version of (BPDN^ 



finding procedure of [4] was extended in [1 to this more general context, and used for root finding for both 
convex and noncovex p (e.g. as in ( |6.2| )). 

The use of alternative penalties as constraints in ( BPDN^ ) does not follow directly from considering 
statistical models of the form (6.1). However, we can think about penalties p as agents who, given an 



error budget 77, distribute it between elements of the residual. The strategy that each agent p will use to 
accomplish this task can be deduced from tail features evident in Figure [6| Specifically, the cost of a large 
residual is prohibitively expensive for the least squares penalty, since its cost is commensurate with that of 
a very large number of small residuals. For example, (10a) 2 = 100a 2 ; so a residual of size 10a is worth as 
much as 100 residuals of size a to the least squares penalty. Therefore, a least squares penalty will never 
assign a single residual a relatively large value, since this would quickly use up the entire error budget. In 
contrast, |10a| = 10|a|, so a residual of size 10a is worth only 10 residuals of size a when the 1-norm penalty 
is used. This penalty is likely to grant a few relatively large errors to certain residuals, if this resulted in 
a better fit. For the penalty in (6.2), it is easy to see that the cost of a residual of size 10a can be worth 



fewer than 10 residuals of size a, and specific computations depend on v and actual size of a. A nonconvex 
penalty p, e.g. the one in (6.2), would happily let outliers have large residuals, as long as the majority of 



the remaining residuals could stay very small. 

From the discussion in the previous paragraph, it is clear that robust penalties are useful as constraints 
in (BPDN^), and can cleverly distribute the allotted error budget 77, using it for outliers while fitting good 



data. The framework proposed in this paper captures the robust extension, allowing robust data interpolation 
in situations when some of available data is heavily contaminated. To develop this extension, we follow [I] 
to define the generalized value function 



v p (r) = mm p(A(X) - b) s.t. \\X\\* < r , 



x 



(6.3) 



and find v p (r) = 77 using the iteration \2.2\ . As discussed in section |2j for any convex smooth penalty p, 

v' p (r) = -\\A*Vp(f)\\ 2 , (6.4) 

where || • H2 is the spectral norm, and f = A(X) — b for optimal solution X that achieves v p (r). For smooth 
non-convex p, e.g. (6.2), we still use (|6.4[) in iteration (2.2). 



Just as in the standard least squares case, we use the factorization formulation to avoid SVDs. Note 
that Theorem |4.1| and Corollary |4.2| hold for any choice of penalty p. When evaluating the value function 
v p (r), we actually solve 



mm p(A(LR T ) - b) s.t. - 

L,R Z 



< T . 



(6.5) 



For any smooth penalty p, including (6.2), this problem can be solved using the projected gradient method. 



7. Reweighting. Every rank-/c solution X of (BPDN^ ) lives in a lower dimensional subspace of R nxm 
spanned by the n x k row and m x k column basis vectors corresponding to the nonzero singular values of 
X. In certain situations, it is possible to estimate the row and column subspaces of the matrix X either 



from prior subspace information or by solving an initial ( BPDN^ ) problem. 
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In the case where X is a vector, it was shown that prior information on the support (nonzero entries) 
of X can be incorporated in the i\ -recovery algorithm by solving the weighted-^ minimization problem. In 
this case, the weights are applied such that solutions with large nonzero entries on the support estimate have 
a lower cost (weighted i\ norm) than solutions with large nonzeros outside of the support estimate [T2] . 

When X is a matrix, the support estimate is replaced by estimates of the row and column subspace 
bases Uo G R nxk and Vo G ]R mxfe of the largest k singular values of X. Let the matrices U G W ixk and 
V G ]R mx/c be estimates of Uo and Vb, respectively. The weighted nuclear norm minimization problem can 
then be formulated as follows: 



min\\QXW\U s.t. p(A(X) - b) < 77, 



x 



(wBPDN^) 



where Q = ujUU' + U^U 1 - , W = ujVV + V^V 1 - , and u is some constant between zero and one. Here, we 



use the notation U G 



to refer to the orthogonal complement of U in 



l , and similarly for V 



Note that matrices Q and W are invert ible, and hence the reweighed LASSO problem still fits into the 
class of problems characterized by Theorem 



4.1 



Specifically, we can write any objective f(X) subject to a 



reweighted nuclear norm constraint as 



min f(Q- l 1Z(Z)W- 1 ) 
s.t. Tr(Z) < r , 



(7.1) 



where as in Corollary |4.2[ TZ(Z) e xtracts the upper nxm block of Z. A factorization similar to (5.2 ) can then 
be formulated for the (wBPDN^) problem in order to optimize over the lower dimensional factors L G TOnx/v 
and R G 



ixk 



In particular, we can solve a sequence of (LASSO r ) problems 

1 



Li , IX 



b\\ 



s.t. 



QL 
WR 



(7.2) 



where Q and W are as defined above. Problem (7.2) can also be solved using the spectral projected gra- 
dient algorithm. However, unlike to the non- weighted formulation, the projection in this case is nontrivial. 
Fortunately, the structure of the problem allows us to find an efficient formulation for the projection operator. 

7.1. Projection onto the weighted Frobenius norm ball. The projection of a point (L,R) onto 



the weighted Frobenius norm ball \ (\QL\ 
solves 



\WR\\%) 



< r is achieved by finding the point (L, R) that 



1 

mm - 

L,R 2 



s.t. 



QL 
WR 



< r. 



The solution to the above problem is given by 

L = ((/icj 2 + 1)- 



R 



(W + i)- 



WU T - 
1 VV T - 



0- 



l)- 1 V ± V ±T ) R, 



where \x is the Lagrange multiplier that solves /(/z) < r with f{fi) given by 



1, 



-Tr 



. (llUJ 2 
LO 2 



uu 1 



1 



I) 2 

7 VV T 



1 



: U ± U ± )LL 



RR 



V( M w 2 + l) 2 (m+i) 2 
The optimal /u that solves equation ( |7.1| ) can be found using the Newton iteration 
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7.2. Traversing the Pareto curve. The design of an effective optimization method that solves 



(wBPDN,,) requires 1) evaluating problem (7.2), and 2) computing the dual of the weighted nuclear norm 
\QXW\U. 

To compute the dual of the weighted nuclear norm, we follow Theorem 5.1 of [5_ which defines the polar 
(or dual) representation of a weighted gauge function k($x) as k°(<&~ 1 x), where $ is an invertible linear 
operator. The weighted nuclear norm ||QXW||* is in fact a gauge function with invertible linear weighting 
matrices Q and W. Therefore, the dual norm is given by 

(\\Q(-)W\U) d (Z) := WQ-'ZW-'IU 

8. Numerical experiments. We test the performance of our algorithm on two example applica- 
tions. In section |8.1[ we consider the Netflix Prize problem, which is often solved using rank minimiza- 
tion [I3j HU [30] . Using this problem, we compare and discuss advantages of regularization formulations, as 
opposed to algorithms; in particular we emphasize the advantages the (BPDN^ ) formulation from a modeling 
perspective. 

In section [8^1 we apply the proposed methods and extensions to seismic trace interpolation, a key appli- 
cation in exploration geophysics [33], where rank regularization approaches have recently been successfully 
used I 



We compare with classic SVD-based approaches for ( BPDN^ ) formulation in section 



5.2.1 



results for robust completion in section 8.2.2, and present results for the weighted extension in section 



show 
81H1 

8.1. The Netflix problem. We tested the performance of our algorithm on the Movielens (1M) 
dataset, which contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 Movie- 
Lens users. The ratings are on an integer scale from 1 to 5. The ratings matrix is not complete, and the 
goal is to infer the values in the unseen test set. In order to test our algorithm, we further subsampled the 



available ratings by randomly removing 50% of the known entries. We then solved the (BPDN^ ) formulation 
to complete the matrix, and compared the predicted (P) and actual (A) removed entries in order to assess 
algorithm performance. We report the signal-to- noise ratio (SNR): 



SNR = 201og 



P-A 



for different values of rj in the (BPDN^) formulation. 

Since our algorithm requires pre-defining the rank of the factors L and R [[J we perform the recovery 
with the rank k G {5, 10,30,50}. Table 8.1 shows the reconstruction SNR for each of the ranks k and for 
a relative error 77 G {0.5,0.3,0.2} (the data mismatch is reduced to a fraction 77 of the initial error). The 
last row of table [87T] shows the recovery for the unconstrained factorized formulation obtained using LSQR; 
this corresponds to 77 ~ 0. It is clear that for small fc, we get good results without regularization; however, 
as k increases, the quality of the LSQR results quickly decay. The best results are obtained by picking a 
reasonably high k and using regularization. This observation demonstrates the importance of the nuclear 
norm regularization, especially when the underlying rank of the problem is unknown. 

Table 8.2 shows the value of \\X\\* of the reconstructed signal corresponding to the settings in Table pTT] 



While the interpretation of the 77 values are straightforward (they are fractions of the initial data error) , it is 



much more difficult to predict ahead of time which value of r one may want to use when solving (LASSO 



This illustrates the modeling advantage of the (BPDN^) formulation: it requires only the simple parameter 
77. Once 77 is provided, the algorithm (not the user) will instantiate (LASSO r ) formulations, and find the 
right value r that satisfies v(r) = 77. 



3 The algorithm also requires the variables L, R to be initialized to non-zero values (unlike the classic SPG^i algorithm, 
where X = is the default initialization). We initialize them with small Gaussian random values, small enough that the initial 
r estimate (taken to be the sum of squares of all entries of L and R) is much smaller than the energy of the data. 
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Table 8.1 

Summar y of the recovery results (SNR in dB) on the Movielens (1M) data set for factor rank k and relative error level r] 
for {BPDN^}. The last row gives recovery results for the non-regularized data fitting factorized formulation solved with LSQR. 
Quality degrades with k due to overfitting for the non-regularized formulation, and improves with k when regularization is used. 
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8.2. Seismic missing-trace interpolation. In exploration seismology, large-scale data sets (often 
on the order of petabytes) must be acquired and processed in order to determine the structure of the 
subsurface. In many situations, only a subset of the complete data is acquired due to physical and/or 
budgetary constraints. Recent insights from the field of compressed sensing allow for deliberate subsampling 
of seismic wavefields in order to improve reconstruction quality and reduce acquisition costs [17] . The 
acquired subset of the complete data is often chosen by randomly subsampling a dense regular source or 
receiver grid. Interpolation algorithms are then used to reconstruct the dense regular grid in order to perform 
additional processing on the data such as removal of artifacts, improvement of spatial resolution, and key 
analysis, such as migration. 

In this section, we apply the new rank-minimization approach, along with weighted and robust exten- 
sions, to the trace interpolation problem for two different seismic acquisition examples. We first describe 
the structure of the datasets, and then present the transform we use to cast the interpolation as a rank- 
minimization problem. 

The first example is a real data example from the Gulf of Suez. Seismic data are organized into seismic 
lines, where N r receivers and N s sources are collocated in a straight line. Sources are deployed sequentially, 
and receivers record each shot record for a period of N t time samples. The Gulf of Suez data contains 
N s = 354 sources, N r = 354 receivers, and N t = 1024 with a sampling interval of 0.004s, leading to a 
shot duration of 4s and a maximum temporal frequency of 125 Hz. Most of the energy of the seismic line 
is preserved when we restrict the spectrum to the 12-60Hz frequency band. Figs. 8.1 ^a) and (b) illustrate 
the 12Hz and 60Hz frequency slices in the source-receiver domain, respectively. In order to simulate missing 
traces, we apply a subsampling mask that randomly removes 50% of the sources, resulting in the subsampled 
frequency slices illustrated in Figs. 8.1 (c) and (d). 

State of the art trace interpolation schemes transform the data into sparsifying domains, for example 
using the Fourier [33] and Curvelet [17] transforms. The underlying sparse structure of the data is then be 
exploited to recover the missing traces. The approach proposed in this paper allows us to instead exploit 
the matrix structure of seismic data, and to design formulations that can achieve trace interpolation using 
matrix-completion strategies. 

The main challenge in applying rank-minimization for seismic trace interpolation is to find a transform 
domain that satisfies the following two properties: 
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Fig. 8.1. Frequency slices of a seismic line from Gulf of Suez with 354 shots, 354 receivers. Fully data for (a) low 
frequency at 12 Hz and (b) high frequency at 60 Hz in s-r domain. 50% Subsampled data for (c) low frequency at 12 Hz and 
(d) high frequency at 60 Hz in s-r domain. Full data for (e) low frequency at 12 hz and (f) high frequency at 60 Hz in m-h 
domain. 50% subsampled data for (g) low frequency at 12 Hz and (h) high frequency at 60 Hz in m-h domain. 




Number of singular values Number of singular values Number of singular values Number of singular values 



(a) (b) (c) (d) 

Fig. 8.2. Singular value decay of fully sampled (a) low frequency slice at 12 Hz and (c) high frequency slice at 60 Hz in 
(s-r) and (m-h) domains. Singular value decay of 50% subsampled (b) low frequency slice at 12 Hz and (d) high frequency 
data at 60 Hz in (s-r) and (m-h) domains. Notice that for both high and low frequencies, decay of singular values is faster in 
the fully sampled (m-h) domain than in the fully sampled (s-r) domain, and that subsampling does not significantly change the 
decay of singular value in (s-r) domain, while it destroys fast decay of singular values in (m-h) domain. 



1. Fully sampled seismic lines have low-rank structure (quickly decaying singular values) 

2. Subsampled seismic lines have high rank (slowly decaying singular values). 

When these two properties hold, rank-penalization formulations allow the recovery of missing traces. To 
achieve these aims, we use the transformation from the source-receiver (s-r) domain to the midpoint-offset 
(m-h). The conversion from (s-r) domain to (m-h) domain is a coordinate transformation, with the midpoint 
is defined by m = |(s + r) and the half-offset is defined by h = |(s — r). This transformation is illustrated 



by transforming the 12Hz and 60Hz source-receiver domain frequency slices in Figs. 8.1 ^a) and (b) to the 
midpoint-offset domain frequency slices in Figs. |8.1[ e) and (f). The corresponding subsampled frequency 
slices in the midpoint-offset domain are shown in Figs. |8.1[ g) and (h). 

To show that the midpoint-offset transformation achieves aims 1 and 2 above, we plot the decay of 
the singular values of both the 12Hz and 60Hz frequency slices in the source-receiver domain and in the 
midpoint-offset domain in Figs. |8.2| (a) and (c). Notice that the singular values of both frequency slices 
decay faster in the midpoint-offset domain, and that the singular value decay is slower for subsampled data 
in Figs. K2\ (b) and (d). 



Let X denote the data matrix in the midpoint-offset domain and let R be the subsampling operator that 



maps Figs. |8.1| (e) and (f) to Figs. 8.1 ^g) and (h). Denote by S the transformation operator from the source- 
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receiver domain to the midpoint-offset domain. The resulting measurement operator in the midpoint-offset 
domain is then given by A = R<S H . 



We formulate and solve the matrix completion problem ( BPDN^ ) to recover a seismic line from Gulf of 
Suez in the (m-h) domain. We work with monochromatic frequency slices and adjust the rank while going 
from low to high frequency slices. We use 300 iterations of SPG^i for all frequency slices. Figures 



(b) show the recovery and error plot for the low frequency slice at 12 Hz, respectively. Figures 



8.3 



8.3 



a) and 
c) and 

(d) show the recovery and error plot for the high frequency slice at 60 Hz, respectively. Figures 8.4 shows a 
common-shot gather section after misisng-trace interpolation from Gulf of Suez data set. 

In the second acquisition example, we implement the proposed formulation on the 5D synthetic seismic 
data provided by BG. We extract a single 4D monochromatic frequency slice from this data set and perform 
the missing-trace interpolation. Each monochromatic frequency slice has 400 x 400 receivers spaced by 
25m and 68 x 68 sources spaced by 150m. To obtain the observed data, we apply sub-sampling masks 
that randomly remove 25% and 50% of the shots. In case of 4D, we have two choices of matricization, 
as shown in Figures [8^5| a,b) , where we can either place the (ReceiverX,ReceiverY) dimensions in the rows 
and (SourceX,SourceY) dimensions in the columns, or (ReceiverY,SourceY) dimensions in the rows and 
(ReceiverX,SourceX) dimensions in the columns. We observed the same behavior of singular value decay 
as shown in Figure [8^ e,f), for each of these strategies. We therefore selected the transform domain to 
be the permutation of source and receivers coordinates, where matricization of each 4D monochromatic 
frequency slices is done using (SourceX, ReceiverX) and (SourceY, ReceiverY) coordinates. We use rank 120 
for the interpolation, and run the solver for a maximum of 1000 iterations. Figures |8.6| and |8.7| show the 
interpolation results for 25% and 50% subsampling ratios, respectively. The resulting interpolations have 
low reconstruction error. 

In order to illustrate the importance of the nuclear-norm regularization, we solved the interpolation 
problem using a simple least-squares formulation on the same seismic data set from Gulf of Suez. The least 
squares problem was solved using the L, R factorization structure, thereby implicitly enforcing a rank on 



the recovered estimate (i.e, formulation (5.2) was optimized without the r-constraint). The problem was 



then solved with the factors L and R having a rank k G {5,10,20,30,40,50,80,100}. The reconstruction 



SNRs comparing the recovery for the regularized and non-regularized formulations are shown in Fig. 8.8 
The figure shows that the performance of the non-regularized approach decays with rank, due to overfitting. 
The regularized approach, in contrast, obtains better recovery as the factor rank increases. 

8.2.1. Comparison with classical nuclear- norm formulation. To illustrate the advantage of pro- 
posed matrix-factorization formulation (which we refer to as LR below) over classical nuclear-norm formu- 
lation, we compare the reconstruction error and computation time with the existing techniques. The most 



natural baseline is the SPG^i algorithm [5 applied to the classic nuclear norm (BPDN^ ) formulation, where 
the decision variable is X, the penalty function p is the 2-norm, and the projection is done using the SVD. 
This example tests the classic formulation against LR, in the same algorithmic framework. The second 



comparison is with the TFOCS [3 algorithm, which can also solve the (BPDN^) formulation. 

The comparisons are done using two different data sets. In the first example, we generated a rank 20 
matrix of size 100 x 100. We subsampled the matrix by randomly removing 50% of the data entries. The 



synthetic low-rank example in table 8.3 shows the comparison of SNR and computational time. The rank 
of the factors was set to 20, the true rank of the data matrix, in this experiment. Both the classic SPG^i 
algorithm and the LR reformulation are faster than TFOCS. The speed advantage increases as the error 
threshold r] decreases. LR is faster than the classic SPG^i, and this advantage also increases as the data 
is required to be fit more tightly. The standard SPG£\ formulation and TFOCS produce roughly the same 
quality of result for each iteration. LR gives uniformly better SNR results, and the improvement is very 
significant for lower error thresholds. The high quality of these results are due to the fact that the rank of 
the factors was set to the true rank of the data matrhjfj 

In the second example, we interpolated missing traces of a monochromatic frequency slice (of size 354 x 
354), extracted from Gulf of Suez data set. We subsampled the frequency slice by randomly removing the 



4 We tested this hypothesis by re-running the experiment with higher factor rank. For example, selecting factor rank to be 
40 gives SNRs of 15.7, 28.5, 36.7, 44.1 for the corresponding 77 values for the synthetic low-rank experiment in |8.3| These values 
are lower than when the true rank is used, but still uniformly better than those returned by SPG^i, as we would expect. 
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50% of shots and performed the missing-trace interpolation in (m-h) domain. We compares the SNR and 
computation time for a fixed set of iterations and rj. The rank of the factors was set to 35. The seismic 



example in table 8.3 shows the results. Once again, both the classic SPG^i algorithm and LR are faster 
than TFOCS. Moreover, on this larger example LR is also significantly faster than SPG^i. Note that our 
implementation of LR does not take advantage of the clever residual formation idea discussed in section [5^21 
since we form the full matrix LR T each time to compute the residual. Implementing a routine that only 
computes the entries required to compare with b would introduce additional savings (in this case, we would 
expect the LR formulation to be twice as fast). In the quality of recovery, both SPG^i and LR have better 
SNR than TFOCS. LR achieves lower SNR than SPGf i for high error thresholds, but better results for lower 
thresholds. 

Table 8.3 

TFOCS versus classic SPG£\ versus LR factorization. Synthetic low rank example shows results for completing a rank 
20 100 x 100 matrix, with 50% missing entries. Computational time and SNR are shown for n = 0.1, 0.01, 0.001, 0.0001. Rank 
of the factors is taken to be 20. Seismic example shows results for matrix completion a low-frequency slice at 10 Hz, extracted 
from the Gulf of Suez data set, with 50% missing entries. Computational time and SNR are shown for n = 0.2, 0.1, 0.05, 0.04. 
Rank of factors was taken to be 35. 
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8.2.2. Simultaneous missing-trace interpolation and denoising. To illustrate the utility of ro- 
bust cost functions, we consider a situation where observed data are heavily contaminated. The goal here 
is to simultaneously denoise interpolate the data. We work with same seismic line from Gulf of Suez. To 
obtain the observed data, we apply a sub-sampling mask that randomly removes 50% of the shots, and to 
simulate contamination, we replace another 10% of the shots with large random errors, whose amplitudes are 
three times the maximum amplitude present in the data. In this example, we formulate and solve the robust 
matrix completion problem (BPDN^ ), where the cost p is taken to be the penalty (6.2); see section [6] for the 



explanation and motivation. As in the previous examples, the recovery is done in the (m-h) domain. We 
implement the formulation in the frequency domain, where we work with monochromatic frequency slices, 
and adjust the rank and v parameter while going from low to high frequency slices. Figure |8.9| compares 
the recovery results with and without using a robust penalty function. The error budget plays a significant 
role in this example, and we standardized the problems by setting the relative error to be 20% of the initial 
error, so that the formulations are comparable. 

We can clearly see that the standard least squares formulation is unable to recover the true solution. 
The intuitive reason is that the least squares penalty is simply unable to budget large errors to what should 
be the outlying residuals. The Student's t penalty, in contrast, achieves a good recovery in this extreme 
situation, with an SNR of 17.9 DB. In this example, we used 300 iterations of SPG^i for all frequency slices. 

8.2.3. Re- Weighting. Re-weighting for seismic trace interpolation was recently used in [23] to improve 
the interpolation of subsampled seismic traces in the context of sparsity promotion in the curvelet domain. 
The weighted i\ formulation takes advantage of curvelet support overlap across adjacent frequency slices. 

to 



Analogously, in the matrix setting, we use the weighted rank- minimization formulation (wBPDN^ 



take advantage of correlated row and column subspaces for adjacent frequency slices. We first demonstrate 
the effectiveness of solving the (wBPDN^) problem when we have accurate subspace information. For this 



purpose, we compute the row and column subspace bases of the fully sampled low frequency (11Hz) seismic 
slice and pass this information to (wBPDN^) using matrices Q and W. Figures 8.10 ^a) and (b) show the 
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Fig. 8.3. Recovery results for 50% subsampled 2D frequency slices using the nuclear norm formulation, (a) Interpolation 
and (b) residual of low frequency slice at 12 Hz with SNR = 19.1 dB. (c) Interpolation and (d) residual of high frequency slice 
at 60 Hz with SNR = 15.2 dB. 



we apply the (wBPDN^ 
ahead of time 



residual of the frequency slice with and without weighting. The reconstruction using the (wBPDN^ ) problem 
achieves a 1.5dB improvement in SNR over the non- weighted (BPDN^) formulation. 
Next, 



formulation in a practical setting where we do not know subspace bases 
but learn them as we proceed from low to high frequencies. We use the row and column 

for 10.75 Hz and 15.75 Hz frequency slices as subspace estimates 

formulation in this way 



subspace vectors recovered using (BPDN 



for the adjacent higher frequency slices at 11 Hz and 16 Hz. Using the (wBPDN^ 



yields SNR improvements of 0.6dB and ldB, respectively, over (BPDN^) alone. Figures 8.11 'a) and (b) 



show the residual for the next higher frequency without using the support and Figures 8.11 ^c) and (d) shows 



the residual for next higher frequency with support from previous frequency. Figure |8.12| shows the recovery 
SNR versus frequency for weighted and non- weighted cases for a range of frequencies from 9 Hz to 17 Hz. 



9. 



Conclusions. We have presented a new method for matrix completion. Our method combines the 

formulations with SVD-free matrix factorization methods. 

formulation on the Netflix Prize problem, 



Pareto curve approach for optimizing (BPDN^ 



We demonstrated the modeling advantages of the (BPDN^ 



and obtained high-quality reconstruction results for the seismic trace interpolation problem. Comparison 
with state of the art methods for the ( BPDN^ ) formulation showed that the factorized formulation is faster 
than both TFOCS and classic SPG^i formulations that rely on the SVD. 

We also proposed two extensions. First, using robust penalties p in (BPDN^), we showed that simul- 
taneous interpolation and denoising can be achieved in the extreme data contamination case, where 10% of 
the data was replaced by large outliers. Second, we proposed a weighted extension (wBPDN^), and used it 



to incorporate subspace information we learned on the fly to improve interpolation in adjacent frequencies. 
10. Appendix. 



Proof of Theorem J±.l, Recall [6, Lemma 2.1]: if SS T = KK T ', then S = KQ for some orthogonal matrix 
Q G M rxr . Next, note that the objective and constraints of (4.2) are given in terms of SS T , and for any 
orthogonal Q G R rxr , we have SQQ T S T = SS T , so S is a local minimum of (4.2) if and only if SQ is a 
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Fig. 8.4. Missing trace interpolation of a seismic line from Gulf of Suez, (a) Ground truth, b) 50% subsampled common 
shot gather, (c) Recovery result with a SNR of 18.5 dB. (d) Residual. 



local minimum for all orthogonal Q G R rxr . 

If Z is a local minimum of ( |4.1| ), then any factor S with Z = SS T is a local minimum of (4.2 ). Otherwise, 
we can find a better solution £ in the neighborhood of 5, and then Z := SS T will be a feasible solution 
for (4.1) in the neighborhood of Z (by continuity of the map S — >• SS T ). 

We prove the other direction by contrapositive. If Z is not a local minimum for (4.1), then you can find 
a sequence of feasible solutions Z^ with f(Z k ) < f(Z) and Z^ 
Zk are all feasible for (4.1), so Sk are feasible for ( |4.2 
can therefore find a subsequence of Sj — » S with SS 1 = Z, and f(SjSj 
Z = SS T = SS T , and S is not a local minimum for (4.2), and therefore (by previous results) S cannot be 
either. 



Z. For each /c, write Z^ = S^Sj, . Since 
By assumption {Z^} is bounded, and so is we 
-+ S with SS^= Z, and f(SjSj) < f(SS T ). In particular, we have 
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(d) (e) (f) 

Fig. 8.5. Matricization of J^D monochromatic frequency slice. Top: (SourceX,SourceY) matricization. Bottom: 
(SourceX,ReceiverX) matricization. Left: Fully sampled data; Middle: Subsampled data; Right: Singular value decay. 




Fig. 8.6. Missing-trace interpolation of a monochromatic frequency slice extracted from 5D data set, subsampling ratio 
= 25%. (a,b,c) Original, recovery and residual of a common shot gather with a SNR of 24-8 dB at the location where shot is 
recorded. (e,f,g) Interpolation of common shot gathers at the location where no reference shot is present. 
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Fig. 8.7. Missing-trace interpolation of a monochromatic frequency slice extracted from 5D data set, subsampling ratio 
= 50%. (a,b,c) Original, recovery and residual of a common shot gather with a SNR of 25.3 dB at the location where shot is 
recorded. (e,f,g) Interpolation of common shot gathers at the location where no reference shot is present. 




Fig. 8.8. Comparison of regularized and non-regularized formulations . SNR of (a) low frequency slice at 12 Hz and (b) 
high frequency slice at 60 Hz over a range of factor ranks. Without regularization, recovery quality decays with factor rank due 
to over-fiting; the regularized formulation improves with higher factor rank. 
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Fig. 8.9. Comparison of interpolation and denoising results for the Student's t and least-squares misfit function, (a) 50% 
subsampled common receiver gather with another 10 % of the shots replaced by large errors, (b) Recovery result using the 
least-squares misfit function. (c,d) Recovery and residual results using the student's t misfit function with a SNR of 17.2 dB. 
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Fig. 8.10. Residual error for recovery of 11 Hz slice (a) without weighting and (d) with weighting using true support. 
SNR in this case is improved by 1.5 dB. 
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Fig. 8.11. Residual of low frequency slice at 11 Hz (a) without weighing (c) with support from 10.75 Hz frequency slice. 
SNR is improved by 0.6 dB. Residual of low frequency slice at 16 Hz (b) without weighing (d) with support from 15.75 Hz 
frequency slice. SNR is improved by ldB. Weighting using learned support is able to improve on the unweighted interpolation 
results. 
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Fig. 8.12. Recovery results of practical scenario in case of weighted factorized formulation over a frequency range of 9-17 
Hz. The weighted formulation outperforms the non-weighted for higher frequencies. For some frequency slices, the performance 
of the non-weighted algorithm is better, because the weighted algorithm can be negatively affected when the subspaces are less 
correlated. 
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